AI Hallucinations: A Misnomer Worth Clarifying
- 2025-07-14
- Published: 2024-01-09
- Authors: Negar Maleki, Balaji Padmanabhan, Kaushik Dutta
A systematic literature review analyzing problems with the term "AI hallucination," including its inconsistent categorization. The review covers papers published through early October 2023.
Abstract
As large language models continue to advance in Artificial Intelligence (AI), text generation systems have been shown to suffer from a problematic phenomenon termed often as “hallucination.” However, with AI’s increasing presence across various domains including medicine, concerns have arisen regarding the use of the term itself. In this study, we conducted a systematic review to identify papers defining “AI hallucination” across fourteen databases. We present and analyze definitions obtained across all databases, categorize them based on their applications, and extract key points within each category. Our results highlight a lack of consistency in how the term is used, but also help identify several alternative terms in the literature. We discuss implications of these and call for a more unified effort to bring consistency to an important contemporary AI issue that can affect multiple domains significantly.
Introduction
In AI, "hallucination" was originally used in a positive sense but gradually took on a negative meaning. (See: Hallucinating faces)
(Hallucination) was associated with constructive implications such as super-resolution, image inpainting, and image synthesis. Interestingly, in this context hallucination was regarded as a valuable asset in computer vision rather than an issue to be circumvented.
There is still no clear, widely agreed-upon definition.
In medicine, a hallucination refers to a specific kind of sensory experience. AI, however, has no "sensory experience," and using "hallucination" with a negative connotation can also stigmatize people who actually experience hallucinations.
This study analyzes how the term "AI hallucination" is used across various fields.
Methodology
A systematic literature review was conducted across fourteen databases, including PubMed and Google Scholar. Every paper that passed the inclusion criteria was examined individually to extract its definition of AI hallucination. The review covers papers published through early October 2023.
Results
A variety of terms were in use; even when papers used the same term, their definitions often differed, and some definitions contradicted one another.
Before ChatGPT's public release on November 30, 2022, the term "AI hallucination" was rarely used outside computer science, but usage exploded after that point.
Alternative terms found in the literature:
- Confabulation: AI-generated responses that sound plausible but are, in fact, incorrect.
- Delusion: AI-generated responses that are false.
- Stochastic Parroting: 1) The repetition of training data or its patterns, rather than actual understanding or reasoning. 2) An LLM generates confident, specific, and fluent answers that are factually completely wrong.
- Factual Errors: Inaccuracies in information or statements that are not in accordance with reality or the truth, often unintentional but resulting in incorrect or misleading information.
- Fact Fabrication: The occurrence where inaccurate information is invented, not represented in the training dataset, and is presented lucidly.
- Fabrication: 1) The phenomenon where, as a Generative AI, ChatGPT generates outputs based on statistical prediction of the text without human-like reasoning, potentially resulting in plausible-sounding but inaccurate responses. 2) The phenomenon in ChatGPT output where the text is cogent but not necessarily true.
- Falsification and Fabrication: Definition was not provided.
- Mistakes, Blunders, Falsehoods: Answers that are fabricated when data are insufficient for an accurate response.
- Hasty Generalizations, False Analogy, False Dilemma: AI models making inferences that do not follow from the premises; also “hasty generalizations,” i.e., the fallacy of making (too) strong claims based on (too) limited data.
Discussion
Reducing AI hallucination is an important problem, but a solution remains far off. As one step toward a solution, it would help to first sort out the concepts and terminology. More discussion will be needed going forward.